knitr::opts_chunk$set(echo = TRUE)

Tasks

Task 1

Obtain the working directory.

getwd()

Task 2

Read the EPAGAS data into R.

epa <- read.csv("EPAGAS.csv")
head(epa)

Task 3

Use z-scores to determine outliers.

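mpg <- epa[[1]]  # MPG is the first column of the file

(Note: R shows this column's name as `ï..MPG` because EPAGAS.csv begins with a UTF-8 byte-order mark. Text editors do not display the mark, so it cannot be deleted there; selecting the first column by position sidesteps the mangled name.)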
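A stray prefix like this on the first column name is the usual symptom of a UTF-8 byte-order mark at the start of the file. The sketch below is one possible way to handle it, shown on a small temporary file with made-up values rather than the real EPAGAS.csv: passing `fileEncoding = "UTF-8-BOM"` asks `read.csv` to drop the mark while reading.

```r
# Hypothetical stand-in for EPAGAS.csv: a tiny CSV that starts with a UTF-8 BOM.
# The MPG values are made up for illustration.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("MPG\n36.3\n41.0\n36.9\n")), tmp)

# fileEncoding = "UTF-8-BOM" tells R to skip the byte-order mark.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

If this works the same way on the real file, the column could then be accessed by its plain name.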

mpgz <- (mpg - mean(mpg)) / sd(mpg)
mean(mpgz)
sd(mpgz)
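The two checks above should return values very close to 0 and 1, since standardizing always produces a mean of 0 and a standard deviation of 1. As a small illustration on made-up numbers (the vector `x` is hypothetical, not the EPAGAS data), the manual formula can be compared with base R's `scale()`:

```r
# Made-up values standing in for the MPG column.
x <- c(30.0, 32.5, 35.1, 36.3, 37.0, 38.2, 41.0, 44.9)

# Manual z-scores, as in the report.
z_manual <- (x - mean(x)) / sd(x)

# scale() centers by the mean and divides by the sample standard deviation.
z_scale <- as.numeric(scale(x))

isTRUE(all.equal(z_manual, z_scale))  # same values either way
abs(mean(z_manual)) < 1e-12           # mean is 0 up to rounding error
isTRUE(all.equal(sd(z_manual), 1))    # standard deviation is 1
```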

Possible outliers:

mpg[abs(mpgz) >= 2 & abs(mpgz) <= 3]

Outliers:

mpg[abs(mpgz) > 3]
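The two cutoffs used above can also be wrapped in a small helper that labels every observation at once. This is only a sketch: `classify_z` is a hypothetical name, and the example z-scores are made up to exercise each boundary.

```r
# Label z-scores using the same cutoffs as the report:
#   |z| > 3        -> "outlier"
#   2 <= |z| <= 3  -> "possible outlier"
#   otherwise      -> "typical"
classify_z <- function(z) {
  ifelse(abs(z) > 3, "outlier",
         ifelse(abs(z) >= 2, "possible outlier", "typical"))
}

# Made-up z-scores covering each case, including the boundaries 2 and 3.
z_demo <- c(0, 2, 3, 3.5, -2.5)
labels_demo <- classify_z(z_demo)
table(labels_demo)
```

With the real data, `classify_z(mpgz)` would give one label per car, which could be tabulated in the same way.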

Lattice dotplot:

library(lattice)
dotplot(~mpg, col=ifelse(abs(mpgz)>3, "red", ifelse(abs(mpgz)>=2, "blue", "black")))

Task 4

Use Chebyshev's Theorem and the Empirical Rule.

boxplot(mpg, horizontal=TRUE, notch=TRUE)

According to Chebyshev's Theorem, at least 3/4 (75%) of the data should be within 2 standard deviations of the mean.

mean(abs(mpgz) <= 2)
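More generally, Chebyshev's Theorem guarantees that at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean, for k > 1. The sketch below compares that lower bound with the observed proportion on simulated, deliberately skewed data. The function name `chebyshev_check` and the simulated vector are assumptions for illustration, not part of the assignment.

```r
# Compare the observed share within k standard deviations
# with Chebyshev's lower bound 1 - 1/k^2.
chebyshev_check <- function(x, k) {
  z <- (x - mean(x)) / sd(x)
  c(k = k, observed = mean(abs(z) <= k), bound = 1 - 1 / k^2)
}

set.seed(1)
x_skewed <- rexp(200)  # simulated right-skewed data, not the EPAGAS values

check2 <- chebyshev_check(x_skewed, 2)
check3 <- chebyshev_check(x_skewed, 3)
rbind(check2, check3)
```

The observed proportion should never fall below the bound, whatever the shape of the data.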

About 96% of the data lies within 2 standard deviations of the mean, well above the 75% minimum, so the data is consistent with Chebyshev's Theorem.

According to the Empirical Rule, approximately 95% of the data should be within 2 standard deviations of the mean.
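The Empirical Rule's percentages come from the normal distribution. As a quick reference, the exact normal-theory proportions within 1, 2, and 3 standard deviations can be computed with `pnorm`; the variable names here are illustrative.

```r
# Share of a normal distribution within k standard deviations of the mean.
k <- 1:3
normal_share <- pnorm(k) - pnorm(-k)
round(normal_share, 4)
# 0.6827 0.9545 0.9973
```

These are the values that the rule rounds to roughly 68%, 95%, and 99.7%.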

The data also agrees with the Empirical Rule: about 96% of the data is within 2 standard deviations of the mean, slightly more than the approximate 95% the rule predicts.

The Empirical Rule is appropriate here. As the boxplot shows, the data is roughly symmetric and concentrated around the center: the lowest and highest quarters of the data (the whiskers) span much wider ranges than the two middle quarters (the box).



draket333/MATH4753tayl0062 documentation built on Sept. 10, 2020, 11:49 a.m.